DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling
We introduce DreamPaint, a framework to intelligently inpaint any e-commerce
product on any user-provided context image. The context image can be, for
example, the user's own image for virtual try-on of clothes from the e-commerce
catalog on themselves, the user's room image for virtual try-on of a piece of
furniture from the e-commerce catalog in their room, etc. As opposed to
previous augmented-reality (AR)-based virtual try-on methods, DreamPaint does
not use or require 3D modeling of either the e-commerce product or the user
context. Instead, it directly uses 2D images of the product as available in the
product catalog database, and a 2D picture of the context, for example one taken
with the user's phone camera. The method relies on few-shot fine-tuning of a
pre-trained diffusion model with the masked latents (e.g., Masked
DreamBooth) of the catalog images per item, whose weights are then loaded into a
pre-trained inpainting module that is capable of preserving the characteristics
of the context image. DreamPaint preserves both the product image and the
context (environment/user) image without requiring text guidance to describe
the missing part (product/context). DreamPaint can also infer the best 3D
angle at which to place the product at the desired
location on the user context, even if that angle was previously unseen in the
product's reference 2D images. We compare our results against both text-guided
and image-guided inpainting modules and show that DreamPaint yields superior
performance in both a subjective human study and quantitative metrics.
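The abstract does not give DreamPaint's exact training objective. The sketch below makes two assumptions. First, fine-tuning "with the masked latents" is taken to mean that the denoising loss is computed only inside the product mask. Second, the inpainting step preserves the context by keeping every position outside the mask unchanged. Plain nested lists stand in for latent tensors and images. `masked_mse` and `composite` are hypothetical helper names and are not part of DreamPaint or any diffusion library.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def masked_mse(pred_noise: Grid, true_noise: Grid, mask: Grid) -> float:
    """Mean squared error over positions where mask == 1 (the product region).

    Hypothetical reading of fine-tuning "with the masked latents": positions
    outside the mask contribute nothing to the loss.
    """
    total, count = 0.0, 0
    for p_row, t_row, m_row in zip(pred_noise, true_noise, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            if m:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


def composite(generated: Grid, context: Grid, mask: Grid) -> List[List[float]]:
    """Keep generated content inside the mask and the original context outside it."""
    return [
        [m * g + (1 - m) * c for g, c, m in zip(g_row, c_row, m_row)]
        for g_row, c_row, m_row in zip(generated, context, mask)
    ]


if __name__ == "__main__":
    pred = [[0.5, 9.0], [1.0, 9.0]]
    true = [[0.0, 0.0], [0.0, 0.0]]
    mask = [[1, 0], [1, 0]]  # left column = product region
    print(masked_mse(pred, true, mask))  # 0.625: the 9.0 errors outside the mask are ignored
    print(composite([[1, 1], [1, 1]], [[0, 0], [0, 0]], [[1, 0], [0, 1]]))  # [[1, 0], [0, 1]]
```

Under this reading, the background of each catalog image does not influence the per-item fine-tuning. The compositing step also guarantees that the user's context is reproduced exactly wherever the mask is zero.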
Quilt-1M: One Million Image-Text Pairs for Histopathology
Recent advances in multi-modal applications have been made possible by
the plethora of image and text data available online. However, the scarcity of
analogous data in the medical field, specifically in histopathology, has halted
comparable progress. To enable similar representation learning for
histopathology, we turn to YouTube, an untapped resource of videos, offering
hours of valuable educational histopathology videos from expert
clinicians. From YouTube, we curate Quilt: a large-scale vision-language
dataset consisting of image and text pairs. Quilt was automatically
curated using a mixture of models, including large language models, handcrafted
algorithms, human knowledge databases, and automatic speech recognition. In
comparison, the most comprehensive datasets curated for histopathology amass
only around K samples. We combine Quilt with datasets from other sources,
including Twitter, research papers, and the internet in general, to create an
even larger dataset: Quilt-1M, with one million paired image-text samples, making it
the largest vision-language histopathology dataset to date. We demonstrate
the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model
outperforms state-of-the-art models on both zero-shot and linear probing tasks
for classifying new histopathology images across diverse patch-level
datasets of different sub-pathologies, as well as on cross-modal retrieval tasks.
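CLIP fine-tuning and the zero-shot evaluation mentioned above both depend on cosine similarity between image and text embeddings. The sketch below assumes the standard symmetric contrastive (InfoNCE) objective from the original CLIP work. It uses toy 2-D vectors in place of real encoder outputs. `clip_loss`, `zero_shot_classify`, and the fixed `scale` value are illustrative choices and do not come from the Quilt-1M training code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def neg_log_softmax(logits: Sequence[float], target: int) -> float:
    """-log softmax(logits)[target], computed stably by subtracting the max."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def clip_loss(images: Sequence[Vector], texts: Sequence[Vector], scale: float = 100.0) -> float:
    """Symmetric contrastive loss: image i should match caption i and vice versa.

    `scale` is a fixed stand-in for CLIP's learned logit scale (temperature).
    """
    n = len(images)
    logits = [[scale * cosine(img, txt) for txt in texts] for img in images]
    image_to_text = sum(neg_log_softmax(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(neg_log_softmax(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


def zero_shot_classify(image: Vector, class_prompts: Sequence[Vector]) -> int:
    """Index of the class-prompt embedding most similar to the image embedding."""
    scores = [cosine(image, p) for p in class_prompts]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]  # caption i is close to image i
    swapped = [[0.1, 0.9], [0.9, 0.1]]  # captions paired with the wrong images
    print(clip_loss(images, matched) < clip_loss(images, swapped))  # True
    print(zero_shot_classify([0.2, 0.8], [[1.0, 0.0], [0.0, 1.0]]))  # 1
```

Cross-modal retrieval uses the same similarity scores. Instead of taking a single argmax over class prompts, it ranks every caption for a given image, or every image for a given caption.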
Deep Learning of Micro-Doppler Features for Aided and Unaided Gait Recognition
IEEE Radar Conference (RadarConf) (2017 : Seattle, WA)
Remote health monitoring is a topic that has gained increased interest as a way to improve the quality and reduce the cost of health care, especially for the elderly. Falling is one of the leading causes of injury and death among the elderly, and gait recognition can be used to detect and monitor neuromuscular diseases as well as emergency events such as heart attacks and seizures. In this work, the potential for radar to discriminate a large number of classes of human aided and unaided motion is demonstrated. Deep learning of micro-Doppler features is used with a 3-layer auto-encoder structure to achieve 89% correct classification, a 17% improvement in performance over the benchmark support vector machine classifier supplied with 127 pre-defined features.
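The abstract names a 3-layer auto-encoder but gives no layer sizes, activations, or training details. The sketch below reads "3-layer" as input, hidden code, and reconstruction. It assumes a sigmoid code layer and a linear decoder, trained by full-batch gradient descent on squared reconstruction error. `TinyAutoencoder` is a hypothetical name, and the toy 4-dimensional vectors only stand in for micro-Doppler feature vectors.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyAutoencoder:
    """Input -> sigmoid hidden code -> linear reconstruction (hypothetical sketch)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded for reproducible initial weights
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b_dec = [0.0] * n_in

    def encode(self, x: Vector) -> Vector:
        """Compressed code: one sigmoid unit per hidden neuron."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, h: Vector) -> Vector:
        """Linear reconstruction of the input from the code."""
        return [sum(w * hj for w, hj in zip(row, h)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, data: List[Vector]) -> float:
        """Mean (over samples) of the summed squared reconstruction error."""
        total = 0.0
        for x in data:
            r = self.decode(self.encode(x))
            total += sum((ri - xi) ** 2 for ri, xi in zip(r, x))
        return total / len(data)

    def train_step(self, data: List[Vector], lr: float) -> None:
        """One full-batch gradient-descent step on `loss`, via hand-written backprop."""
        n_hidden, n_in = len(self.w_enc), len(self.w_dec)
        g_we = [[0.0] * n_in for _ in range(n_hidden)]
        g_be = [0.0] * n_hidden
        g_wd = [[0.0] * n_hidden for _ in range(n_in)]
        g_bd = [0.0] * n_in
        for x in data:
            h = self.encode(x)
            r = self.decode(h)
            d_r = [2.0 * (ri - xi) / len(data) for ri, xi in zip(r, x)]  # dLoss/dr
            for k in range(n_in):
                g_bd[k] += d_r[k]
                for j in range(n_hidden):
                    g_wd[k][j] += d_r[k] * h[j]
            for j in range(n_hidden):
                d_h = sum(d_r[k] * self.w_dec[k][j] for k in range(n_in))
                d_z = d_h * h[j] * (1.0 - h[j])  # sigmoid'(z) = h * (1 - h)
                g_be[j] += d_z
                for i in range(n_in):
                    g_we[j][i] += d_z * x[i]
        # Apply all updates only after every gradient has been accumulated.
        for j in range(n_hidden):
            self.b_enc[j] -= lr * g_be[j]
            for i in range(n_in):
                self.w_enc[j][i] -= lr * g_we[j][i]
        for k in range(n_in):
            self.b_dec[k] -= lr * g_bd[k]
            for j in range(n_hidden):
                self.w_dec[k][j] -= lr * g_wd[k][j]


if __name__ == "__main__":
    # Toy stand-ins for feature vectors; real inputs would be micro-Doppler features.
    data = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0],
            [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
    ae = TinyAutoencoder(n_in=4, n_hidden=2)
    print("loss before:", ae.loss(data))
    for _ in range(300):
        ae.train_step(data, lr=0.05)
    print("loss after: ", ae.loss(data))
    print("code for first sample:", ae.encode(data[0]))
```

In a classification pipeline, the code returned by `encode` would be passed to a classifier over the motion classes. The abstract does not say which classifier sits on top of the learned features, so that stage is left out here.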